A Cost-efficient Rewriting Scheme to Improve Restore Performance in Deduplication Systems

Authors

  • Jie Wu
  • Yu Hua
  • Pengfei Zuo
  • Yuanyuan Sun
Abstract

In chunk-based deduplication systems, logically consecutive chunks are physically scattered across different containers after deduplication, which causes a serious fragmentation problem. Fragmentation significantly reduces restore performance because the scattered chunks must be read from many different containers. Existing work rewrites fragmented duplicate chunks into new containers to improve restore performance. However, rewriting introduces redundancy among containers, which lowers the deduplication ratio and leaves redundant chunks in the containers retrieved to restore a backup, wasting limited disk bandwidth and slowing the restore. To improve restore performance while preserving a high deduplication ratio, this paper proposes a cost-efficient submodular maximization rewriting scheme (SMR). SMR first formulates defragmentation as an optimization problem of selecting suitable containers, and then builds a submodular maximization model to address this problem by selecting containers with more distinct referenced chunks. We implement SMR in a deduplication system and evaluate it on two real-world datasets. Experimental results demonstrate that SMR outperforms the state-of-the-art work in terms of both restore performance and deduplication ratio. We have released the source code of SMR for public use.
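The container-selection idea in the abstract can be illustrated with a small sketch. The abstract does not spell out SMR's algorithm, so what follows is not the authors' implementation: it is the generic greedy heuristic for maximum coverage, a standard monotone submodular objective. The names `select_containers`, `chunks_to_rewrite`, `refs`, and `budget` are hypothetical, and the assumption is that each old container is described by the set of chunks the current backup references in it.

```python
from typing import Dict, List, Set


def select_containers(refs: Dict[str, Set[str]], budget: int) -> List[str]:
    """Greedy maximum-coverage selection (illustrative, not the SMR code).

    refs maps a container ID to the set of chunk fingerprints that the
    current backup references in that container. At most `budget`
    containers are kept; each step picks the container that contributes
    the most referenced chunks not already covered.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(refs)  # shallow copy so the caller's dict is untouched
    for _ in range(budget):
        best = max(remaining, key=lambda c: len(remaining[c] - covered), default=None)
        if best is None or not (remaining[best] - covered):
            break  # nothing left that adds a new distinct chunk
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


def chunks_to_rewrite(refs: Dict[str, Set[str]], chosen: List[str]) -> Set[str]:
    """Referenced chunks not served by any chosen container.

    In this sketch these are the duplicates that would be rewritten
    into new containers.
    """
    kept: Set[str] = set()
    for cid in chosen:
        kept |= refs[cid]
    referenced: Set[str] = set()
    for chunk_set in refs.values():
        referenced |= chunk_set
    return referenced - kept


# Hypothetical example data: four old containers and the chunks referenced in each.
example_refs = {
    "c1": {"a", "b", "c"},
    "c2": {"c", "d"},
    "c3": {"a"},
    "c4": {"e", "f"},
}
example_chosen = select_containers(example_refs, budget=2)
example_rewrite = chunks_to_rewrite(example_refs, example_chosen)
```

With a budget of two containers, the sketch keeps `c1` and `c4`, which together cover five distinct referenced chunks, and leaves only chunk `d` to be rewritten. Counting distinct chunks instead of raw reference counts is what keeps `c2` and `c3`, whose contents mostly overlap with `c1`, from being selected.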


Similar resources

Accelerating Restore and Garbage Collection in Deduplication-based Backup Systems via Exploiting Historical Information

In deduplication-based backup systems, the chunks of each backup are physically scattered after deduplication, which causes a challenging fragmentation problem. The fragmentation decreases restore performance, and results in invalid chunks becoming physically scattered in different containers after users delete backups. Existing solutions attempt to rewrite duplicate but fragmented chunks to im...


ALACC: Accelerating Restore Performance of Data Deduplication Systems Using Adaptive Look-Ahead Window Assisted Chunk Caching

Data deduplication has been widely applied in storage systems to improve the efficiency of space utilization. In data deduplication systems, the data restore performance is seriously hindered by read amplification since the accessed data chunks are scattered over many containers. A container consisting of hundreds or thousands of data chunks is the data unit to be read from or written to the storage...


An Optimization of Backup Storage using Backup History and Cache Knowledge in reducing Data Fragmentation for In_line deduplication in Distributed

The chunks of data generated by a backup are physically scattered after deduplication in a backup system, which creates a problem known as fragmentation. Fragmentation basically takes the form of sparse and out-of-order containers. Sparse containers adversely affect performance while restoring the database and during garbage collection, while the out-of-order contain...


Design Tradeoffs for Data Deduplication Performance in Backup Workloads

Data deduplication has become a standard component in modern backup systems. In order to understand the fundamental tradeoffs in each of its design choices (such as prefetching and sampling), we disassemble data deduplication into a large N-dimensional parameter space. Each point in the space is of various parameter settings, and performs a tradeoff among backup and restore performance, memory ...


AN EFFICIENT METHOD FOR OPTIMUM PERFORMANCE-BASED SEISMIC DESIGN OF FUSED BUILDING STRUCTURES

A dual structural fused system consists of replaceable ductile elements (fuses) that sustain major seismic damage and leave the primary structure (PS) virtually undamaged. The seismic performance of a fused structural system is determined by the combined behavior of the individual PS and fuse components. In order to design a feasible and economic structural fuse concept, we need a procedure to ...


Publication date: 2017